Toward Performance Models of MPI Implementations for Understanding Application Scaling Issues

Authors

Torsten Hoefler

William Gropp

Rajeev Thakur

Jesper Larsson Träff

Abstract

Designing and tuning parallel applications with MPI, particularly at large scale, requires understanding the performance implications of different choices of algorithms and implementation options. Which algorithm is better depends in part on the performance of the different possible communication approaches, which in turn can depend on both the system hardware and the MPI implementation. In the absence of detailed performance models for different MPI implementations, application developers often must select methods and tune codes without the means to realistically estimate the achievable performance and rationally defend their choices. In this paper, we advocate the construction of more useful performance models that take into account limitations on network-injection rates and effective bisection bandwidth. Since collective communication plays a crucial role in enabling scalability, we also provide analytical models for scalability of collective communication algorithms, such as broadcast, allreduce, and all-to-all. We apply these models to an IBM Blue Gene/P system and compare the analytical performance estimates with experimentally measured values.
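As a rough illustration of what an analytical model for a collective can look like, the sketch below estimates broadcast time with the textbook latency-bandwidth (alpha-beta) cost model for a binomial tree. This is not the model from the paper: the function name and the parameters `alpha` (per-message latency in seconds) and `beta` (transfer time per byte in seconds) are assumptions made for illustration, and the model deliberately ignores the network-injection and bisection-bandwidth limits that the abstract argues should be taken into account.

```python
def binomial_bcast_time(p, msg_bytes, alpha, beta):
    """Estimate binomial-tree broadcast time under a simple alpha-beta model.

    p         -- number of processes (hypothetical input)
    msg_bytes -- message size in bytes
    alpha     -- assumed per-message latency, in seconds
    beta      -- assumed transfer time per byte, in seconds

    The tree needs ceil(log2(p)) communication rounds, and each round
    is charged one full message transfer: alpha + beta * msg_bytes.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    if msg_bytes < 0:
        raise ValueError("msg_bytes must be non-negative")
    # (p - 1).bit_length() equals ceil(log2(p)) for every integer p >= 1.
    rounds = (p - 1).bit_length()
    return rounds * (alpha + beta * msg_bytes)


if __name__ == "__main__":
    # Placeholder parameters, not measured values for any real system.
    for procs in (2, 64, 4096):
        t = binomial_bcast_time(procs, 1024, alpha=5e-6, beta=1e-9)
        print(procs, t)
```

Comparing such an estimate against measured broadcast times is one simple way to see where a contention-free model stops matching a real machine.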

Similar resources

Energy and performance characteristics of different parallel implementations of scientific applications on multicore systems

Energy consumption is a major concern with high performance multicore systems. In this paper, we explore the energy consumption and performance (execution time) characteristics of different parallel implementations of scientific applications. In particular, the experiments focus on message-passing interface (MPI)-only versus hybrid MPI/OpenMP implementations for hybrid NAS (NASA Advanced Superc...

Implications of application usage characteristics for collective communication offload

The performance of collective communication operations is known to have a significant impact on the scalability of some applications. Indeed, the global, synchronous nature of some collective operations directly implies that they will become the bottleneck when scaling to hundreds of thousands of nodes. This fact has led many researchers to try to improve the efficiency of collective operations...

MPI- and CUDA- implementations of modal finite difference method for P-SV wave propagation modeling

Among different discretization approaches, the Finite Difference Method (FDM) is widely used for acoustic and elastic full-waveform modeling. An inevitable drawback of the technique, however, is its severe demand for computational resources. A promising solution is parallelization, where the problem is broken into several segments, and the calculations are distributed over different processors. ...

Asynchronous MPI for the Masses

We present a simple library which equips MPI implementations with truly asynchronous non-blocking point-to-point operations, and which is independent of the underlying communication infrastructure. It utilizes the MPI profiling interface (PMPI) and the MPI_THREAD_MULTIPLE thread compatibility level, and works with current versions of Intel MPI, Open MPI, MPICH2, MVAPICH2, Cray MPI, and IBM MPI....

Comparison of MPI Implementations on a Shared Memory Machine

There are several alternative MPI implementations available to parallel application developers. LAM MPI and MPICH are the most common. System vendors also provide their own implementations of MPI. Each version of MPI has options that can be tuned to best fit the characteristics of the application and platform. The parallel application developer needs to know which implementation and options are b...

Publication date: 2010
